
Full text of "BSTJ 57: 4. April 1978: Loop Plant Modeling: Statistical Analyses of Costs in Loop Plant Operations. (Dunn, D.M.; Landwehr, J.M.)"



Copyright © 1978 American Telephone and Telegraph Company 

The Bell System Technical Journal 

Vol. 57. No. 4. April 1978 

Printed in U.S.A. 



Loop Plant Modeling: 



Statistical Analyses of Costs in Loop Plant 
Operations 

By D. M. DUNN and J. M. LANDWEHR 

(Manuscript received October 10, 1977) 

The Serving Area Concept (SAC) involves a new procedure for the 
design and administration of the loop plant to reduce operating costs. 
Two major problems facing a loop plant engineer considering conver- 
sion to SAC are determining which areas should be converted (and in 
what order) and assessing the savings resulting from the conversion. 
This paper presents methodology and data analysis results useful for 
solving such problems. The data analyzed are from the Prototype 
District and measure a large number of facility related problems both 
before and after conversion to SAC. A cost penalty measure, based on 
observed facility problems, is calculated for a given area using data 
collected in that area over a certain period of time. The before con- 
version data are characterized and modeled in order to quantify the 
uncertainty, in the form of a confidence interval, associated with this 
cost penalty. Confidence intervals are useful for deciding appropriate sizes 
for the data collection areas and appropriate lengths of time for data 
collection, as well as for comparing the results between two or more areas. 
The effect of conversion to SAC on the cost penalty measure is also ex- 
amined. It is found that after conversion costs are much lower than 
before conversion costs, but that costs continue to decrease for at least 
9 to 12 months after conversion takes place. The analysis and results 
presented here yield methods and guidelines to be used for data col- 
lection and analysis in other districts. These can help in reliably 
choosing areas for conversion to SAC which will maximize savings. 






I. INTRODUCTION AND SUMMARY 

Investment decisions in the loop plant, like most such investment 
decisions in the Bell System, are dependent on careful analyses and the 
data which underlie these analyses. This paper describes detailed studies 
of a large body of data measuring several kinds of loop plant operations 
and costs. The cost measures used are based on the Facility Analysis 
Plan for Outside Plant (FAP); this plan, described and discussed in Ref. 
1, gives methods for managing the loop plant. The results of this paper 
contain guidelines for the use of certain FAP measures, as well as insights 
into related characteristics of the data. 

The data analyzed here are from the Prototype District Project, 2 a 
major effort undertaken to analyze those operating costs of a district that 
can be controlled by changes in the design or administration of the loop 
network. This involved a nearly three year study of the Passaic District 
of New Jersey Bell Telephone Company. Passaic is an urban area with 
some small business, scattered apartments, and large old houses. Many 
sections were converting from single- to multiple-family dwellings. Much 
of the existing loop plant was congested and had maintenance problems. 
Thus, conversion to the Serving Area Concept (SAC) 3 was considered 
appropriate for much of the district. This conversion involves departures 
from dedicated plant design and multipled plant design. 4 Serving area 
interfaces, which are basically large boxes containing cable pair inter- 
connect points, are installed in appropriate places in the network. Then 
cable pairs are permanently connected from the interface to the cus- 
tomer, and complements of feeder pairs from the central office to the 
interface are supplied as needed. The Facility Analysis Plan, developed 
from the Prototype District Project, gives methods for determining when 
and where conversion to SAC is appropriate. 

The Prototype District Data Base 5 is the key to tracking district ac- 
tivities. Each month over 50,000 measurements of district operations 
involving facility related problems were recorded. (Many of these mea- 
surements were zero.) Data were retained by 50-pair complement by 
month for that part of the district undergoing extensive conversion to 
serving areas. Data are available from April 1973 through December 
1975. 

There are many procedures in the Facility Analysis Plan to aid in 
understanding costs and potential savings in the management of loop 
plant. Among the concepts involved are allocation areas, 1,4 which are 
geographical regions used for tracking operating costs and cable usage. 
Allocation areas are also basic units of plant for planning additions or 
changes in the network such as conversion to SAC. Therefore, in order 
to trigger the need for treatment of the network these areas are initially 
ranked on the basis of facility problems in each area. This ranking is 
based on a weighted linear combination of facility problems normalized 

966 THE BELL SYSTEM TECHNICAL JOURNAL, APRIL 1978 



by the number of assigned pairs in the area. The weights are costs asso- 
ciated with the individual problem items and together yield a "Cost 
Penalty Per Assigned Pair" (CPPAP). In Ref. 1, the Normalized Yearly 
Marginal Operating Cost, which is a generalization of CPPAP, is used as 
a basis for their discussion. Other cost calculations include the "Plant 
Stabilization Analysis Form" and the "CUCRIT" analysis to compute the 
rate of return associated with a given investment strategy. While these 
other cost calculations are important and relevant to FAP, the focus of 
this paper is on the CPPAP calculation and its component parts. 

Three specific reasons motivate the choice of CPPAP for analysis here. 
First, it is the initial form used to analyze data in FAP and as such holds 
an important position. Second, the cost calculations for CPPAP are linear 
combinations of observed quantities and hence directly interpretable. 
Third, CPPAP does not require any special factors (e.g., "improvement 
factor") as are needed in most of the other measures. 

The general purpose of this paper is to give insight into facets of these 
data relating to the conversion of selected allocation areas to SAC which 
took place during the Prototype District Project. Two important prob- 
lems to the loop plant engineer are to determine which of the allocation 
areas should be converted (and in what order), and to assess the savings 
resulting from the conversion. The data analysis addresses these prob- 
lems by modeling the variability of the FAP data. The uncertainty as- 
sociated with projected savings is found to decrease as the serving areas 
become larger (in assigned pairs) and the data collection period in- 
creases. 

An exploratory analysis of the before, during, and after conversion 
cost measure and its components (Section II) shows that the cost mea- 
sure varies widely both across areas and time. Assignment changes, cable 
troubles, and defective pairs contribute the most to the level and vari- 
ability. A detailed statistical analysis of the before conversion cost data 
in Section III is used as a basis to develop confidence intervals (Section 
IV) on the "true" cost penalty. These intervals quantify the uncertainty 
associated with an observed cost penalty for a given area. They are useful 
for deciding appropriate sizes for the data collection areas and appropriate 
lengths of time for data collection, as well as for comparing the results 
between two or more areas. Moreover, confidence intervals show the 
trade-off between the size of the data collection area and the data col- 
lection period. 

Finally, the effect of the conversion on the cost measure is examined 
in Section V. A regression equation is developed which models the after 
conversion costs in terms of before and during conversion variables as 
well as the time since conversion. The major result shows that costs 
continue to decrease after conversion takes place. In order to get an 
adequate measure of the savings associated with conversion to SAC, one 
must collect data for at least nine to twelve months after conversion. 

STATISTICAL ANALYSES OF COSTS 967 



It should be noted (before proceeding with the data analysis) that 
much of the work described was also performed on other savings mea- 
sures including the rate of return. The same techniques which are shown 
for CPPAP were found useful, but for brevity their results are not 
shown. 

II. GENERAL CHARACTERISTICS OF THE COST DATA 
2.1. Introduction 

The purpose of this section is to give some insight into the data used 
in the further analyses in this paper. As described above, the analysis 
focuses solely on the data in the CPPAP, which is calculated using the 
"Allocation Area Problem Ranking Worksheet." 1 This worksheet is 
shown in Fig. 1. The cost factors in column B are specific to the Prototype 
District, but they are also representative of other loop plant districts. 
Abbreviations used in Fig. 1 and throughout this section are as follows: 
LST — line and station transfer; WOL — wired out of limits; BCT — break 
connect-through; CDP — clear defective pair; BPC — break permanent 
connection; CIR — control point interconnection; RE — referred to engi- 
neer; RTC — reterminated connection; AC-SOD — assignment change 
because the originally assigned pair from a service order was found to 
be defective; AC-NS — non-service-order assignment change; AC-OTH — 
other assignment change; FCT-7AB — 7A or 7B cable trouble associated 
respectively with splicing and terminating troubles; FCT-OTH — other 
cable trouble; DEF PRS — defective pairs. For definitions and discussion 
of these and other loop plant terms, see Ref. 4. 

Two of the items on the worksheet were not measured directly in the 
data base. They are the BCT and RTC. However, based on engineering 
studies in the Prototype District 6 it was determined that these could be 
adequately approximated for the Prototype District during the study 
period by a fraction of the total facilities assigned, which is measured 
in the data base. These studies determined that BCTs were 13 percent 
of the facilities assigned and that RTCs were 35 percent of the facilities 
assigned. Finally, the management of the loop plant used in the Prototype 
District was such that there were no CDP, BPC, or CIR. Therefore, in all 
further analyses these cost factors are ignored. All other variables, except 
the number of defective pairs, are available (monthly) in the data base. 
Defective pairs were entered annually from the district's yearly pair 
status report. This report gives the pair status (e.g., assigned, defective, 
etc.) as of January 1 and is used monthly for the twelve month period 
centered at January 1 (i.e., July through June). Thus, the data to be 
studied in this section are the monthly values of the CPPAP and the 11 
sub-components of CPPAP that were either measured or estimated during 
the study. 




AA ________        DATE OF RANKING ________

                         A                          B           C
LINE                                                COST        COST
 #    ITEM               ENTRY                      FACTOR      PENALTY
 1    LST                #/YR                       X  17.52    =
 2    WOL                #/YR                       X  36.81    =
 3    BCT                #/YR                       X   7.64    =
 4    CDP                #/YR                       X  72.70    =
 5    BPC                #/YR                       X  24.64    =
 6    CIR                #/YR                       X  70.55    =
 7    RE                 #/YR                       X  35.15    =
 8    RTC                #/YR                       X   9.48    =
 9    AC - S.O. Def      #/YR                       X  29.35    =
10    AC - Non S.O. Def  #/YR                       X  68.14    =
11    AC - Other         #/YR                       X  32.63    =
12    FCT - 7A, B        #/YR                       X  83.32    =
13    FCT - Other        #/YR                       X 109.00    =
14    Def Prs            # DEF PR X DIST CO KFT     X   0.91    =
15    TOTAL COST PENALTY (SUM 1 TO 14)
16    COST PENALTY PER ASSIGNED PAIR = LINE 15 / # ASSIGNED PAIRS
Fig. 1 — Allocation area problem ranking worksheet. 
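The worksheet arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cost factors are taken from column B of Fig. 1 in line order, the entry for defective pairs is read as the number of defective pairs times a distance from the central office in kilofeet, and the 13 and 35 percent approximations for BCT and RTC are those quoted in Section 2.1. The function and argument names are hypothetical and do not come from the Facility Analysis Plan.

```python
# Cost factors (dollars per occurrence) as read from column B of Fig. 1.
COST_FACTORS = {
    "LST": 17.52, "WOL": 36.81, "BCT": 7.64, "CDP": 72.70, "BPC": 24.64,
    "CIR": 70.55, "RE": 35.15, "RTC": 9.48, "AC-SOD": 29.35,
    "AC-NS": 68.14, "AC-OTH": 32.63, "FCT-7AB": 83.32, "FCT-OTH": 109.00,
}
DEF_PAIR_FACTOR = 0.91  # line 14 factor, applied to (# defective pairs x kft)


def cppap(yearly_counts, defective_pairs, dist_co_kft, assigned_pairs,
          facilities_assigned=None):
    """Cost penalty per assigned pair (line 16 of the worksheet).

    yearly_counts maps an item name to its number of occurrences per year.
    If facilities_assigned is given, BCT and RTC are approximated as 13 and
    35 percent of it unless they are supplied explicitly (Section 2.1).
    """
    counts = dict(yearly_counts)
    if facilities_assigned is not None:
        counts.setdefault("BCT", 0.13 * facilities_assigned)
        counts.setdefault("RTC", 0.35 * facilities_assigned)
    # Lines 1 to 13: entry times cost factor.
    total = sum(COST_FACTORS[item] * n for item, n in counts.items())
    # Line 14: defective pairs weighted by distance (assumed interpretation).
    total += defective_pairs * dist_co_kft * DEF_PAIR_FACTOR
    # Line 16: line 15 divided by the number of assigned pairs.
    return total / assigned_pairs
```

For example, `cppap({"LST": 10, "AC-NS": 5}, defective_pairs=20, dist_co_kft=10, assigned_pairs=500, facilities_assigned=100)` sums the five contributions and divides by 500 assigned pairs.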

2.2. Components of CPPAP 

The CPPAP has 11 non-zero cost components. However, two of those 
variables are perfectly correlated since they are both proportions of the 
facilities assigned (i.e., BCT and RTC). Since both the cost factor (see 
Fig. 1) and the proportion of facilities assigned are higher for RTC than 
for BCT, it is the RTCs which will be used in the further analyses in this 
subsection. In later sections of the paper all components are used in the 
calculation of CPPAP. 

A numerical summary of the level (mean) and variability (standard 
deviation) of the ten cost components for each of the three stages of area 
conversion is given in Table I. So that a few extreme data values do not 
overwhelm the rest of the data, the 25 percent trimmed mean and 
standard deviation were used; thus these values are based on only the 
middle 50 percent of the data. First the trimmed mean across months 
for each area in each stage of conversion was computed; the tabled values 
are the trimmed mean and trimmed standard deviation of those values 
across the 10 converted areas. 

Table I. CPPAP component costs for 10 converted areas 

                 Trimmed mean               Trimmed std dev
Variable     Before  During  After      Before  During  After
LST            0.65    0.12    0.0        0.77    0.17    0.0
WOL            0.04    0.0     0.0        0.18    0.0     0.0
RE             0.77    0.04    0.0        1.20    0.19    0.0
RTC            0.53    0.40    0.21       0.22    0.14    0.09
AC-SOD         0.26    0.10    0.01       0.38    0.18    0.12
AC-NS          1.44    3.78    0.46       1.16    3.35    0.52
AC-OTH         1.12    1.47    0.44       0.56    1.41    0.41
FCT-7AB        2.30    4.97    0.21       1.36    4.50    0.61
FCT-OTH        0.19    0.13    0.0        0.82    0.39    0.0
DEF PRS        0.80    0.84    0.85       0.63    0.83    0.89

Focusing on the mean (level) values first, 
it is clear that the dollar costs shown in the table vary widely from 
component to component as well as across the stages of conversion. Perhaps 
the most remarkable change is in the non-service-order assignment 
change tickets (AC-NS), which go from $1.44 before to $3.78 during to 
$0.46 after. However, considering the physical situation, this type of 
behavior is to be expected. During the conversion, many of the cable pairs 
are necessarily handled because of the nature of the design of an 
allocation area. This handling can cause many of the pairs to become 
defective and can interrupt the customer's service. The service is restored 
either by changing the customer to a new pair (recorded as an AC-NS) or 
by actually fixing the defective pair (recorded as an FCT-7AB). Note 
further that the occurrences of splicing and terminating cable troubles 
(FCT-7AB) also peak during conversion and fall to greatly reduced levels 
in the after period. However, other cable troubles (FCT-OTH) contribute 
little to CPPAP. The category of assignment changes due to the originally 
assigned pair from a service order being defective (AC-SOD) drops to very 
nearly zero after conversion. Other assignment changes (AC-OTH) are a 
major contributor to CPPAP during all three periods of conversion. The 
LST, WOL, and RE after conversion all have zero trimmed mean and 
standard deviation. The category of defective pairs (DEF PRS) is 
interesting because its level stays the same from during to after, and its 
variability actually increases during this transition. However, since the 
defective pair data are only updated annually, these results should be 
considered preliminary. More detailed special studies of defective pair 
rates have been performed and are included in Ref. 2. 
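The 25 percent trimming behind Table I can be illustrated with a short Python sketch. It assumes the simplest reading of the text: sort the values, drop the lowest and highest quarter, and compute the ordinary mean and sample standard deviation of what remains. The paper does not spell out how fractional counts or the trimmed standard deviation were handled, so those details (and the function names) are assumptions.

```python
from statistics import mean, stdev


def middle_values(values, proportion=0.25):
    """Return the values left after trimming `proportion` from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number dropped from each tail
    return ordered[k:len(ordered) - k]


def trimmed_mean(values, proportion=0.25):
    """Mean of the middle part of the data (middle 50 percent by default)."""
    return mean(middle_values(values, proportion))


def trimmed_std(values, proportion=0.25):
    """Sample standard deviation of the retained middle values (assumed form)."""
    kept = middle_values(values, proportion)
    return stdev(kept) if len(kept) > 1 else 0.0
```

With the eight values 1, 2, 3, 4, 5, 6, 7, 100, only 3, 4, 5, 6 are retained, so the single extreme value has no influence on either statistic.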

While the table is a helpful summary of overall behavior, it is not useful 
in trying to characterize the similarities and differences among the areas 
with regard to the components of CPPAP. Graphical displays of 
multivariate data are often useful for gaining insight into the basic 
structure of data. However, they tend to become more complicated and less 
useful as the number of variables increases. Based on Table I, it seems 
fairly clear that most of the interesting (large and variable) dollar 
components of CPPAP are found in the assignment changes, the cable 
troubles, and the defective pairs. The costs associated with LSTs, WOLs, 
REs and RTCs tend to be both small and fairly stable. Therefore, in the 
graphical displays the focus will be on the six largest and most diverse 
cost components. 

Figure 2 gives one example of a polygon plot 7 for three of the converted 
areas and the mean converted area (i.e., the 25 percent trimmed mean 
of the converted areas). The polygon is formed by connecting the value 
of each variable plotted on its respective axis (see Fig. 2 key). By exam- 
ining the polygons associated with different areas and stages of con- 
version it is possible to visually compare and contrast characteristics of 
the areas. Note the similarity of the areas for before, during, and to some 
extent after. The values in these plots are as in Table I, and show dollar 
amounts. The scaling is designed to show most of the variability in these 
data without being distorted by a few very large values. Although areas 
of a polygon do not directly correspond to the total cost associated with 
an allocation area, areas do give some idea of that sum. For example, it 
is clear that after conversion the cost penalty is very small compared with 
during and before. The anomalous large value of the non-service order 
assignment changes (mentioned earlier) is evident in the during period. 
The peak on the first axis from the vertical position is this large 
value. 

2.3. Analysis of CPPAP 

To achieve an initial feel for the nature of the CPPAP data, a plot of 
these values against time for the individual allocation areas is useful. 
Figure 3 shows a sequence of four allocation areas for their entire 33 
month data history. Note that the vertical scales on the four plots, which 
show dollar cost penalties, are different. While such differences make 
across area comparisons difficult, the range of the data (particularly 
including converted and non-converted areas) is so large that using a 
single scale would obscure much of the available detail. Because there 
is a good deal of variability in the CPPAP measure, a non-linear (resistant) 
smoother is applied to the data and plotted (as the solid line) along with 
the raw values. The resistant smoother used is (3RSR), twice. 8 Since this 
smoother is based on moving medians, rather abrupt changes may occur 
in the smoothed output. This smoother was selected for just this reason 
so that rapid changes in the level of the data (e.g., after conversion) would 
not be obscured. 
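The flavor of a resistant smoother built from moving medians can be shown with a simplified Python sketch. It implements only repeated running medians of three ("3R") followed by a "twice" step, in which the residuals are smoothed and added back; the splitting step ("S") of the 3RSR smoother cited above is omitted, and end values are simply copied. It is therefore an assumption-laden stand-in, not the smoother actually used for Fig. 3.

```python
from statistics import median


def running_median3(x):
    """One pass of running medians of 3; the two end values are copied."""
    if len(x) < 3:
        return list(x)
    inner = [median(x[i - 1:i + 2]) for i in range(1, len(x) - 1)]
    return [x[0]] + inner + [x[-1]]


def smooth_3r(x):
    """Repeat running medians of 3 until the sequence stops changing."""
    current = list(x)
    while True:
        nxt = running_median3(current)
        if nxt == current:
            return current
        current = nxt


def twice(x, smoother=smooth_3r):
    """Smooth, then smooth the residuals (the 'rough') and add them back."""
    first = smoother(x)
    rough = [xi - si for xi, si in zip(x, first)]
    return [si + ri for si, ri in zip(first, smoother(rough))]
```

Because the smooth is built from medians, an isolated spike is removed entirely while an abrupt and sustained change in level is passed through unchanged, which is the property the text relies on.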

Of these four allocation areas (212 through 215), two were eventually 
converted (213 and 214), while the other two were not. For those areas 
which have been converted, lines are drawn to indicate the end of the 
before conversion period, and the beginning of the after conversion 
period. Note that these vertical lines are drawn between actual monthly 
observations. The data accuracy only allows full month designations of 
before, during, or after. For example, in area 214 months 1-5 are before, 
months 6-13 are during, and months 14-33 are after conversion. 

[Figure 2(a): polygon plots for ALLOCATION AREA 213, ALLOCATION AREA 222, ALLOCATION AREA 231, and the MEAN AREA, with BEFORE, DURING, and AFTER panels.] 

Fig. 2. (a) Components of CPPAP (radius length 7.12). (b) Key. 

Analysis of this figure (and others) showing all the area-time histories 
gives a considerable amount of insight into the nature of the data. 

(i) The CPPAP for the areas where there is no conversion tends to 
be more stable than for areas that undergo conversion. 

(ii) Fairly large excursions from a smooth value are evident for all 
areas. (Note that the resistant smoother is not affected by these unusual 
excursions.) 

(iii) The level and variability of the before, during, and after may be 
quite different. 

(iv) The after conversion behavior of these areas is quite different. 
For example, in area 213 the CPPAP drops quickly to a value near zero. 
In area 214 there is a slow but steady decline to a near zero value for 
CPPAP. 

(v) No evident seasonal pattern is visible in this limited amount of 
data. 

[Figure 2(b): key to the polygon axes; axis labels surviving extraction: AC-SOD, DEF PRS, AC-NS, FCT-OTH, AC-OTH.] 

Fig. 2. (continued) 

Table II shows a basic summary of the behavior of each of the 10 
converted areas for before, during, and after conversion months. The 
25 percent trimmed mean and standard deviation are used, as in Table 
I, so that the tabled values reflect the bulk of the data. Table II shows 
that both the level and variability change during the "life" of an area. 
The during period tends to have the highest levels. The after period is 
the lowest in both level and variability, as would be expected, because 
the effect of conversion is to reduce the occurrence of the costly plant 
troubles. The variability of the before conversion data is quite 
high and not uniform across areas. 

In summary, based on these and similar displays, CPPAP values appear 
to vary quite widely both across allocation areas and stages of conversion. 
For those areas which were converted, the level and variability of the 
individual components of CPPAP tend to be concentrated in the as- 
signment changes, cable troubles, and defective pairs. 



[Figure 3: raw and smoothed CPPAP plotted against MONTHS for individual allocation areas; panel titles surviving extraction: ALLOCATION AREA 212, ALLOCATION AREA 213.] 
Fig. 3 — Rough and smooth CPPAP for various allocation areas. 

III. EXPLORATORY AND GRAPHICAL ANALYSES OF BEFORE 
CONVERSION DATA 

3.1. Motivation 

One use of the Prototype District Data Base is to develop methods of 
analysis for determining which allocation areas in other districts should 
be converted to SAC, using FAP techniques. Related questions concern 
how many months of data should be collected before making such 
decisions, and how large the areas should be in the first place so that 
reliable decisions can eventually be made. This section explores certain 
properties of these data, motivated by these goals. Fluctuations in the 
calculated cost penalty across months and across areas can be large, as was 
seen in Section II. Thus, statistical methods are needed to help answer 
these questions. Since only before conversion data could be used to help 
in making decisions regarding conversion, only the before data from the 
data base are considered here. The analysis uses the cost measure CPPAP 
for reasons described in Section I. 

Table II. CPPAP for 10 converted areas 

             Trimmed mean               Trimmed std dev
Area     Before  During  After      Before  During  After
209        4.39    6.68   1.90        2.84    5.20   1.10
210        5.97   22.32   3.51        6.52   10.98   2.22
213        8.84   32.56   3.35        3.40   31.65   3.25
214       11.43   11.81   4.18       14.75    7.73   5.98
221        9.58   10.29   1.91        6.86    9.58   0.00
222        9.41   11.64   4.01        3.02    5.03   0.00
227       10.37   10.07   6.67        6.85    4.30   4.95
228       11.73   14.97   4.22        4.91    7.61   1.19
229       14.15   17.00   3.02       10.86   22.88   0.93
231       21.10   17.03   3.32        7.52    7.28   2.04

The goal here is to examine the structure of the before conversion data 
so as to be led to reasonable methods of analysis (i.e., reasonable as- 
sumptions and models) to answer these questions. We concentrate on 
searching for and examining certain relationships by studying appro- 
priate scatter plots. While certain numerical statistics are also useful for 
such purposes, an advantage of plots is that they are more exploratory 
in nature. Section IV then presents and uses a specific model, supported 
by the data, as a way of answering the questions in the previous para- 
graph. 

3.2. Analyses 

For the following plots, consider the cost penalty $x_{ij}$ for area $i$ and 
month $j$. The mean, $\bar{x}_i$, and standard deviation, $s_i$, of these values across 
months for each area were calculated. Only before conversion data were 
used, so the number of months differs from one area to another; however, 
recall that 13 of the 23 areas were never converted, so for these areas all 
33 months are available. Figure 4 plots the standard deviation $s_i$ vs. the 
average cost penalty $\bar{x}_i$ for all 23 areas. A positive relationship between 
these two quantities is very clearly apparent. Such a relationship strongly 
violates assumptions that would be desirable and convenient to use. 
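The per-area summaries behind this plot are ordinary means and sample standard deviations taken across each area's before conversion months. A minimal Python sketch follows; the data layout (a dictionary from area number to a list of monthly cost penalties) and the function name are assumptions made for illustration.

```python
from statistics import mean, stdev


def area_summaries(before_by_area):
    """Map each area to (mean, standard deviation) of its monthly values.

    before_by_area: dict of area -> list of before conversion cost
    penalties, one per month; lists may differ in length between areas.
    """
    return {area: (mean(values), stdev(values))
            for area, values in before_by_area.items()}
```

Plotting the second element of each pair against the first gives the kind of display described for Fig. 4.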

[Figure 4: standard deviation $s_i$ versus ALLOCATION AREA MEAN ($\bar{x}_i$).] 

Fig. 4. CPPAP values before conversion. 

Another look at this relationship can be obtained by considering the 
sizes of the 23 areas. Since the cost penalty $x_{ij}$ is itself an average 
calculated over the number of pairs in the area (cost penalty per assigned 
pair), one might expect the standard deviation of these values, $s_i$, to be 
smaller the larger the size of the area. Figure 5 plots $s_i$ vs. the number 
of assigned pairs in the $i$th area, $p_i$. From theoretical grounds one might 
expect the relationship between $s$ and $p$ to be of the form $s = a/\sqrt{p}$, for 
some $a$. The points in Fig. 5 look like they might generally follow a 
relationship like this, plus some scatter. Thus, we fit a curve $s = a/\sqrt{p}$ to 
these points using least squares* and then formed the residuals $(s_i - \hat{s}_i)$. 
Each residual is plotted against the corresponding $\bar{x}_i$ in Fig. 6. Again a 
strong increasing relationship is apparent; the larger the average cost 
penalty $\bar{x}_i$ for an area, the more likely it is that $(s_i - \hat{s}_i)$ is positive and 
large. Even after removing the effect of area size from the standard 
deviation $s_i$, higher area averages $\bar{x}_i$ are associated with higher area 
standard deviations $s_i$. 

One approach to answering the questions put forth in Section 3.1 
would be to fit an appropriate linear statistical model to these data, and 
then make inferences from that model. However, one of the assumptions 
underlying the usual fitting of such a model is that of homogeneity of 
variance; i.e., the variance of the observations should be constant across 
different levels of other variables. Because of the relationships seen 
above, it is worthwhile also to consider transformations of CPPAP when 
exploring the before conversion data. Some transformed variable could 
quite possibly be more appropriate for later, more formal analysis 
than the raw CPPAP values. 



* Weighted least squares were used, for reasons described below. 



[Figure 5: standard deviation $s_i$ versus ASSIGNED PAIRS ($p_i$).] 

Fig. 5 — CPPAP values before conversion. 

Several transformations of the cost penalties within the family $y = 
(x + a)^b$, with $a$ and $b$ specified parameters, were calculated and studied. 
Considering the results as a whole, the most satisfactory and interesting 
properties appeared using the transformation $y = \ln(x + 1)$, which 
corresponds to $b = 0$, with $x$ the CPPAP as before. Thus the following 
plots in this section were all constructed using this transformation. 
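The transformation family can be written as a small Python helper. Treating $b = 0$ as the logarithm follows the usual convention for power families, where the log is the limiting member as $b$ tends to zero; the function name and default arguments are illustrative assumptions.

```python
import math


def power_transform(x, a=1.0, b=0.0):
    """Return (x + a) ** b, with b == 0 taken to mean ln(x + a)."""
    if b == 0:
        return math.log(x + a)
    return (x + a) ** b
```

With the defaults this is the $\ln(x + 1)$ transform used in the rest of the section; the shift of 1 keeps a zero cost penalty finite, mapping it to zero.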

Figure 7 plots the standard deviation $(s_y)_i$ vs. $\bar{y}_i$, with the plotting 
character showing the size of the area: "1" for areas with assigned pairs 
$p_i < 500$; "2" for $500 < p_i < 700$; "3" for $700 < p_i < 950$; "4" for $p_i > 950$. 
There appears to be no systematic relation between $s_y$ and $\bar{y}$, although 
the two extreme (high and low) values of $\bar{y}$ possibly suggest a decreasing 
trend; certainly there is nothing like the behavior in Fig. 4. Moreover, 
the higher number plotting characters tend to be at the bottom of the 
plot with the lower numbers at the top, implying that larger areas have 
smaller variability, apart from their average value. The area average $\bar{y}_i$ 
is plotted against size $p_i$ in Fig. 8; these quantities appear unrelated, so 
knowing a priori the size of an area does not enable one to say much 
about its expected average cost penalty. 

[Figure 6: residuals $(s_i - \hat{s}_i)$ versus ALLOCATION AREA MEAN ($\bar{x}_i$).] 

Fig. 6. CPPAP values before conversion. 

Figure 9 shows the standard deviation $(s_y)_i$ plotted against size $p_i$. 
There is a downward trend, and one expects larger areas to have smaller 
variability. To see to what extent this trend is accounted for by a $s_y = 
a/\sqrt{p}$ relationship, $a$ was obtained by a weighted least squares regression 
of $(s_y)_i$ on $1/\sqrt{p_i}$; the fitted curve is the solid line in Fig. 9. A weighted 
regression was used because the variances of the individual points $(s_y)_i$ 
about their expectations $\tau_i = a/\sqrt{p_i}$ depend on the values of $\tau_i$ and $m_i$, 
the number of months of before data for that area; assuming normality 
of the $y$'s, the variance is $0.5\,\tau_i^2/(m_i - 1)$. (This is derived from the $\chi^2$ 
distribution associated with $(s_y)^2$.) Thus, weights proportional to the 
reciprocal square roots of these variances were used, and the following 
three plots are the raw residuals multiplied by these weights. 
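The weighted fit can be sketched in Python as follows. This is one reading of the description above, not the authors' program: with $\tau_i = a/\sqrt{p_i}$, the standard deviation of $(s_y)_i$ is proportional to $1/\sqrt{p_i (m_i - 1)}$, so the residual weights are taken as $w_i = \sqrt{p_i (m_i - 1)}$ (the unknown $a$ and the constant 0.5 drop out of the proportionality), and $a$ minimizes the sum of squared weighted residuals. Function names are hypothetical.

```python
import math


def fit_inverse_sqrt(s, p, m):
    """Weighted least squares estimate of a in the model s = a / sqrt(p).

    s: area standard deviations; p: assigned pairs; m: months of data.
    """
    num = 0.0
    den = 0.0
    for s_i, p_i, m_i in zip(s, p, m):
        u_i = 1.0 / math.sqrt(p_i)        # regressor 1/sqrt(p_i)
        w2_i = p_i * (m_i - 1)            # squared weight, 1/variance up to a constant
        num += w2_i * u_i * s_i
        den += w2_i * u_i * u_i
    return num / den


def weighted_residuals(s, p, m, a):
    """Raw residuals s_i - a/sqrt(p_i), each multiplied by its weight."""
    return [math.sqrt(p_i * (m_i - 1)) * (s_i - a / math.sqrt(p_i))
            for s_i, p_i, m_i in zip(s, p, m)]
```

Under these weights the estimate reduces to a weighted average of $s_i \sqrt{p_i}$ with weights $(m_i - 1)$, so areas with more months of before data count for more.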

The residuals $(s_y)_i - (\hat{s}_y)_i$ are plotted against $\bar{y}_i$ in Fig. 10. No strong 
relationship is apparent. Perhaps the points with extremely high and 
low $\bar{y}$ suggest a downward trend, but if these single points are ignored 
no structure at all remains. Figure 11 plots each residual against $m_i$, the 
number of months of before data for that area. One would like to see a 
horizontal band, which would signify no relationship; indeed, the plot 
does not suggest any strong relationship. A normal quantile-quantile 
probability plot 9 of the residuals is displayed in Fig. 12. This shows 
reasonably good normality of the residuals, although the largest value 
is somewhat larger than would be expected and there is some bunching 
of the residuals, for which we have no explanation. 
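The coordinates of a normal quantile-quantile plot can be computed with the Python standard library. The sketch below pairs each ordered residual with a standard normal quantile; the plotting position $(k - 0.5)/n$ is a common convention assumed here, since the text does not say which one was used, and the function name is illustrative.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Return (theoretical quantile, ordered residual) pairs."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((k + 0.5) / n) for k in range(n)]
    return list(zip(theoretical, ordered))
```

If the residuals are roughly normal the points fall near a straight line; an unusually large value shows up as a point well above the line at the right-hand end.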



[Figure 7: allocation area standard deviation versus ALLOCATION AREA MEAN ($\bar{y}_i$); plotting characters: 1 = TINY, 2 = SMALL, 3 = MEDIUM, 4 = LARGE.] 



Fig. 7 — Values of ln(CPPAP + 1) before conversion. 

Thus, for the logarithmic transform of the original cost penalty nicer 
behavior results than with the raw variable. An area's standard deviation 
is unrelated to its level, but it is related to its size in a reasonable way; 
moreover, the residuals from this relationship have reasonable proper- 
ties. A number of additional properties of these data were explored, but 
to conserve space only a few will be discussed in any detail. 

For each month, the mean and standard deviation of the CPPAP values 
for all allocation areas for that month were calculated. Figure 13 plots 
the monthly standard deviations vs. the monthly average, again using 
y = ln(CPPAP + 1). There are 33 points in the plot, one for each month; 
of course the points from later months are based on successively fewer 
values as areas are converted. No relationship is apparent; this is con- 
sistent with the lack of relationship between standard deviation and 
mean as calculated for each area in Fig. 7. The monthly average vs. the 
month number and the smooth of these data [using 4(3RSR)2, twice, a 
non-linear smoother 8 ], are shown in Fig. 14. This suggests somewhat of 
a cyclic behavior in the average cost penalty. Local peaks appear around 
months 1-2, 12-14, and 26-28. One might hypothesize the existence of 
a cyclic 12-month structure to these data due to seasonal local factors 
such as weather, churn, and inward and outward movement. However, 



[Figure 8: plot against assigned pairs $p_i$ (horizontal axis).]



Fig. 8 — Values of ln(CPPAP + 1) before conversion. 

Fig. 14 does not show such clear behavior that one could extrapolate some 
fitted cycle with any confidence. Moreover, recall that the purpose of 
these analyses is to develop methods that could be used with (probably 
less extensive) data from other districts for decision making. We would 
not want to extrapolate a specific seasonal pattern from Fig. 14 to a new 
district without careful consideration of similarities and differences 
between the new district and the Prototype District. One might, though, 
wish to use 12 or 24 months data when arriving at decisions so as to re- 
move seasonal effects. The possible seasonal factor is discussed further 
in reference to somewhat different purposes in Section V. 

Distributional characteristics and the correlation structure of the
transformed observations can also be of interest. Figure 15 gives a normal
quantile-quantile plot of $(y_{ij} - \bar{y}_{i\cdot})\sqrt{p_i}$ for all areas $i$ and months $j$
before conversion. This quantity is of interest because some differences
between areas are expected, but can be removed by looking at the
deviations $y_{ij} - \bar{y}_{i\cdot}$. No strong monthly effect was seen above, so that
possibility is ignored here; and also it was found earlier that $\mathrm{var}(y_{ij})$ is
approximately $\sigma^2/p_i$, so the values $(y_{ij} - \bar{y}_{i\cdot})\sqrt{p_i}$ should have
approximately equal variance. Figure 15 shows that these values are distributed
reasonably closely to the normal distribution.



[Figure 9: plot against assigned pairs $p_i$ (horizontal axis).]

Fig. 9 — Values of ln(CPPAP + 1) before conversion. 

Turning to the possible relationships between areas, a different normal
quantile-quantile plot, calculated from correlations in the following way,
is given in Fig. 16. For each pair of areas $k$ and $l$, the correlation between
the above $(y_{ij} - \bar{y}_{i\cdot})\sqrt{p_i}$, $i = k$ and $l$, was calculated over the before
conversion months common to both areas. This gives 253 ($= 23 \cdot 22/2$)
estimated correlations, and we would like to see to what extent these
differ from a random sample of correlations where the true correlation
coefficient is 0. Fisher's z transformation,

$$z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right),$$

was used to achieve approximate normality. If the population correlation
is 0, then

$$\mathrm{mean}(z) \approx 0, \qquad \mathrm{var}(z) \approx \frac{1}{n + 1},$$

where $n$ is the sample size and $z$ is approximately normally distributed.
For these data each $z$ was divided by the standard deviation
corresponding to the number of months $n$ from which it was calculated, and
Fig. 16 is a normal quantile-quantile plot of the standardized z's. A



[Figure 10: plot against allocation area mean (horizontal axis).]



Fig. 10 — Values of ln(CPPAP + 1) before conversion. 

"perfect" result would have all points on the v = x line, which is drawn 
on the plot. However, even if the true correlation were 0, one would not
necessarily expect our standardized z's to scatter exactly about this line,
since we do not have 253 correlation coefficients calculated independently
of one another. Instead they are formed pairwise from 23 variables,
implying some (complicated) structure among them. In Fig. 16
the points are uniformly above, but quite close to, the y = x line; the
standardized z's are slightly but consistently larger than would be expected
if all true correlations were 0. The median of the standardized
z's corresponds to a population correlation of about 0.3. Thus there is
evidence of a positive but not large correlation between the values in
different areas at the same point in time. This result is not intuitively
unexpected, since geographic proximity is probably the cause. For example,
a heavy rainstorm may increase cable troubles and hence produce larger
values of CPPAP. A more exhaustive exploration of the correlation
structure of these data could also consider correlations both between
and within areas at different points in time, i.e., with leads and lags.
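
The standardization of pairwise correlations described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the synthetic data, and the rule of skipping pairs with fewer than four common months are assumptions. The variance approximation is passed in as a parameter; the default 1/(n - 3) is the common textbook choice and is likewise an assumption here.

```python
import math
import random
from itertools import combinations


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient r."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def pearson(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardized_z(scaled, var_z=lambda n: 1.0 / (n - 3)):
    """scaled maps area -> {month: scaled deviation}.

    Returns {(k, l): z / sd(z)} for every pair of areas, using only the
    months that the two areas have in common.
    """
    out = {}
    for k, l in combinations(sorted(scaled), 2):
        common = sorted(set(scaled[k]) & set(scaled[l]))
        n = len(common)
        if n < 4:  # too few common months for the default variance formula
            continue
        r = pearson([scaled[k][m] for m in common],
                    [scaled[l][m] for m in common])
        out[(k, l)] = fisher_z(r) / math.sqrt(var_z(n))
    return out


# Synthetic, purely illustrative data: 23 areas, 12 common months each.
rng = random.Random(0)
demo = {a: {m: rng.gauss(0.0, 1.0) for m in range(12)} for a in range(23)}
z_std = standardized_z(demo)
print(len(z_std))  # 23 * 22 / 2 = 253 area pairs
```

The resulting values would then be sorted and plotted against normal quantiles, as in Fig. 16.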

Another plot of some interest, Fig. 17, shows $\bar{y}_{i\cdot}$ vs. the distance of each
area from the central office, $d_i$. Although one might or might not expect
such a relationship, the data strongly suggest that areas further from the



[Figure 11: plot against number of months of before data for area, $m_i$ (horizontal axis).]

Fig. 11— Values of ln(CPPAP + 1) before conversion. 



central office have higher cost penalties. It would be of interest to have
explanations for this and to see if this relationship generalizes to other
districts. Such investigations are in progress by the authors and others.
However, as with the possible monthly cycle seen above, we would not
necessarily want to extrapolate this in a straightforward way to other
districts. It is also of interest to consider the plot of the weighted residual
$(s_y)_i - (\hat{s}_y)_i$ vs. $d_i$, given in Fig. 18. Although the area average may be
related to $d_i$, Fig. 18 shows that the part of the standard deviation not
predicted from the size of the area does not seem related to $d_i$. This latter
result fits in with the previous discovery that the standard deviations
of the y's do not appear to be systematically related to anything except
the size of the area.

The entire set of plots and analyses described in this section were re- 
peated using robust estimates of location and scale instead of the sample 
mean and standard deviation. The purpose was to see if a small number 
of deviant observations might be either causing, or hiding, the rela- 
tionships considered above. However, there was no appreciable differ- 
ence in the results. The results using the mean and standard deviation, 
rather than the more robust statistics, were presented above because of 
the widespread familiarity and use of these statistics. 



[Figure 12: normal probability plot of the residuals against theoretical quantiles; 23 points on the plot.]



Fig. 12 — Values of ln(CPPAP + 1) before conversion. 

The analyses were also repeated using other cost measures. As in the 
case of CPPAP, for each of these measures some transformation of the 
original values was discovered which appeared more useful for inter- 
pretation and later analysis than was the raw cost measure. 

IV. DATA COLLECTION GUIDELINES 
4.1. General results 

This section makes use of the results from the previous section to 
construct guidelines for the collection period and size of future allocation 
areas. These guidelines are in the form of confidence intervals for the 



[Figure 13: monthly standard deviation against monthly mean.]



Fig. 13 — Values of ln(CPPAP + 1) before conversion. 

"true savings" given estimated savings, size of area, and the number of 
months of data collection. In addition, methods are presented for ex- 
tending these results to local areas with characteristics different from 
those of the Prototype District. 

Based upon the data analysis of Section III, it is reasonable to use the
following model and analysis. Let $y_{ij} = \ln(\mathrm{CPPAP} + 1)$ be the transformed
cost measure for area $i$ and month $j$. Express this as

$$y_{ij} = \mu_i + e_{ij} \qquad (1)$$

where $\mu_i$ is the "true transformed CPPAP" for this area and $e_{ij}$ is the
"error" term corresponding to this month. We wish to make inferences
about the area values $\mu_i$ and differences $\mu_i - \mu_k$.

Consider assumptions one can reasonably make concerning the $e_{ij}$.
On theoretical grounds it is reasonable to assume that

$$\mathrm{var}(e_{ij}) = \frac{\sigma^2}{p_i} \qquad (2)$$

where $p_i$ is the size, in assigned pairs, of the area. The quantity $\sigma^2$ can
be interpreted as the inherent variability from one assigned pair in one
month, and the error term $e_{ij}$ results from averaging over $p_i$ assigned



[Figure 14: monthly average, with its smooth, against month.]



Fig. 14 — Values of ln(CPPAP + 1) before conversion. 

pairs. This assumption was supported by the analysis of Section III. 
Moreover, that analysis showed that the standard deviation (of the 
transformed CPPAP) does not seem to be related to any other available 
variable. 

Considering further assumptions concerning the distribution of the
$e_{ij}$, it would be convenient, natural, and relatively simple if we could
assume that the $e_{ij}$ are independently normally (Gaussian) distributed
with mean 0 (and variance from eq. (2)). In support of these assumptions,
it was shown in Section III that $\sqrt{p_i}$ times the estimated $e_{ij}$ (i.e.,
$(y_{ij} - \bar{y}_{i\cdot})\sqrt{p_i}$) were normally distributed after transformation. As for the
independence assumption, these values were found in Section III to have
a positive, although not extremely large, correlation between areas.
However, the independence assumption between areas is important
mainly for the confidence interval comparison of two different areas,
as in eq. (5) below, and a positive correlation implies that that interval
would tend to be conservative, i.e., longer than necessary.

Thus, for purposes of the analysis we assume that the $e_{ij}$ are independent
normal $(0, \sigma^2/p_i)$. Thus the estimate $\hat{\mu}_i$ of $\mu_i$ in eq. (1) is $\bar{y}_{i\cdot}$; i.e., the
"true transformed cost penalty" is simply estimated by the average of 



[Figure 15: normal probability plot against theoretical quantiles; 570 points on the plot.]



Fig. 15 — Values of ln(CPPAP + 1) before conversion. 

all observations for that area. Furthermore,

$$\mathrm{var}(\bar{y}_{i\cdot}) = \frac{\sigma^2}{p_i \cdot m_i} \qquad (3)$$

where $m_i$ is the number of months of before conversion values available
for area $i$. Confidence intervals for $\mu_i$ (or $\mu_i - \mu_k$) can be calculated using
eq. (3) and standard normal theory. A $100(1 - \alpha)$ percent confidence
interval for $\mu_i$ is

$$\bar{y}_{i\cdot} \pm z \frac{\hat{\sigma}}{\sqrt{p_i \cdot m_i}} \qquad (4)$$



[Figure 16: normal probability plot of the standardized z's against theoretical quantiles; 253 points on the plot.]



Fig. 16 — Values of ln(CPPAP + 1) before conversion. 

where $z$ is the upper $1 - \alpha/2$ quantile of the standard normal distribution
and $\hat{\sigma}$ is an estimate of $\sigma$ described below. (Alternatively a t distribution
could be used, but the degrees of freedom used in estimating $\sigma$ should
be large enough so that the difference in quantiles would be small.)
Similarly, a confidence interval for the difference in "true" CPPAPs for
two areas, $\mu_i - \mu_k$, is

$$(\bar{y}_{i\cdot} - \bar{y}_{k\cdot}) \pm z \hat{\sigma} \left( \frac{1}{p_i \cdot m_i} + \frac{1}{p_k \cdot m_k} \right)^{1/2} \qquad (5)$$

The estimate of $\sigma^2$, $\hat{\sigma}^2$, is obtained from the regression



[Figure 17: plot against distance from central office, $d_i$ (horizontal axis).]

Fig. 17 — Values of ln(CPPAP + 1) before conversion. 



$$(s_y)_i = \frac{\sigma}{\sqrt{p_i}} + \epsilon_i \qquad (6)$$

where $(s_y)_i$ is the observed standard deviation of the $m_i$ values in area
$i$, and $\epsilon_i$ is an error term reflecting the departure of the observed $(s_y)_i$
from this model. Eq. (6) is obtained from eq. (2) and its use is supported
by Fig. 9 and other analysis in Section III. The variance of $\epsilon_i$, given in
Section III, depends on $i$, so an iterated weighted regression is performed
to obtain $\hat{\sigma}$. Our value is 12.40. Thus the variance is effectively estimated
by pooling results across all areas, while allowing for the fact that
different sized areas have different variance.
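
A single weighted step of the regression in eq. (6) can be sketched as below. This is only an illustration under stated assumptions: the weights used by the authors come from Section III and are not reproduced here, so the weights `w` are left as an input, and the function name and the example data are hypothetical. With no intercept and the single regressor $1/\sqrt{p_i}$, the weighted least squares estimate has a closed form.

```python
import math


def estimate_sigma(pairs, sd, w=None):
    """Weighted no-intercept least squares fit of sd_i = sigma / sqrt(p_i).

    pairs: area sizes p_i (assigned pairs)
    sd:    observed standard deviations (s_y)_i
    w:     regression weights (assumed supplied by the analyst; equal if None)
    """
    if w is None:
        w = [1.0] * len(pairs)
    x = [1.0 / math.sqrt(p) for p in pairs]
    num = sum(wi * xi * si for wi, xi, si in zip(w, x, sd))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


# Hypothetical areas whose standard deviations follow eq. (6) exactly.
sizes = [250, 500, 1000, 1500]
sds = [12.40 / math.sqrt(p) for p in sizes]
print(round(estimate_sigma(sizes, sds, w=sizes), 2))
```

An iterated version would recompute the weights from the current estimate and repeat the step until the estimate stops changing.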

Up to this point all the work in this section has been on variables
measured on the transformed scale, i.e., ln(CPPAP + 1). Recall this
transformation was selected to reduce the dependence of the variability
on the level of CPPAP and to improve normality. Therefore, confidence
intervals are for parameters $\mu_i$, $\mu_k$ which are also transformed. However,
we are interested in having tables (for example) based on the original
data (untransformed) and representing untransformed parameters. This
is simply done by forming the confidence intervals on the transformed
scale and then performing the inverse transformation $x = e^y - 1$.

Shown in Table III are the 95 percent confidence intervals for various 



[Figure 18: plot against distance to central office, $d_i$ (horizontal axis).]

Fig. 18 — Values of ln(CPPAP + 1) before conversion. 

observed values of the CPPAP calculated using eq. (4) and $\hat{\sigma}$ estimated
from the data. The time (in months) is the number of months used in 
forming the average value while the size is in pairs assigned. For example, 
suppose one has an area of 750 assigned pairs and has collected data for 
12 months. If the computed average CPPAP is $10, the confidence interval 
is from $7.47 to $13.29. If the computed CPPAP is $30, the interval is 
$22.87 to $39.26. The interpretation is that 95 percent of the time, an 
observed CPPAP will be such that the associated interval covers the 
"true" CPPAP. Note that these intervals are not symmetric. On the 
transformed scale the assumptions yield a symmetric interval. However, 
when transforming back to the original scale, the nonlinearity of the 
exponentiation results in asymmetric intervals. 
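
The calculation behind these examples can be sketched as follows: form the interval of eq. (4) on the transformed scale, then apply the inverse transformation. The value 12.40 for $\hat{\sigma}$ is the one reported above. The multiplier z = 2 is an assumption, inferred because it reproduces the intervals quoted in the text to the cent; the function name is hypothetical.

```python
import math

SIGMA_HAT = 12.40  # estimate reported in the text
Z = 2.0            # assumed multiplier (rounded 95 percent normal quantile)


def cppap_interval(cppap, pairs, months, sigma=SIGMA_HAT, z=Z):
    """Confidence interval for the "true" CPPAP on the original dollar scale."""
    y = math.log(cppap + 1.0)                         # transformed observation
    half = z * sigma / math.sqrt(pairs * months)      # half-width from eq. (4)
    return math.exp(y - half) - 1.0, math.exp(y + half) - 1.0


for observed in (10.0, 30.0):
    lo, hi = cppap_interval(observed, pairs=750, months=12)
    print(f"observed ${observed:.0f}: ${lo:.2f} to ${hi:.2f}")
```

Because the half-width is added and subtracted before exponentiating, the upper limit lies further from the observed value than the lower limit does, which is the asymmetry noted above.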

From the discussion of the variability of the average computed CPPAP 
it is clear that as the size of the area increases the variability decreases. 
Similarly, if the number of months used in computing the average CPPAP 
increases the variability of the estimate decreases. (In fact, based on eq. 
(4), and evident from Table III, the effects are symmetric.) To aid in 
assessing the magnitude of these effects Figs. 19 and 20 are provided. 
Figure 19 shows the upper and lower confidence limits for an observed 
CPPAP of $20 formed by averaging over 12 months, for various values 



[Table III: 95 percent confidence intervals for the "true" CPPAP, tabulated for three values of the observed CPPAP, six collection periods (in months), and six area sizes (in assigned pairs). The table entries are illegible in this copy.]



[Figure 19: upper and lower confidence limits against size in assigned pairs (100's).]



Fig. 19 — Upper and lower confidence interval. 

of the size. Both the asymmetry and the decrease in the size of the con- 
fidence interval are evident. Note that for the smaller areas the effect 
of the asymmetry is greater. Figure 20 is the same type of plot for an 
observed value of $30 of CPPAP for an area with 250 assigned pairs for 
differing numbers of months. Note that for this very small area, the 
confidence limits are quite wide and the effect of the asymmetry is much 
greater than that seen in Fig. 19. 

Table III can be used to help decide an appropriate size for allocation 
areas and an appropriate length of time for data collection. For a given 
size and time, the confidence intervals for various observed values of 
CPPAP can be read from Table III. For example, if allocation areas are 
created of size 500 assigned pairs or larger, and if data are collected for 
12 months or longer, then an observed CPPAP of $20 would give a con- 
fidence interval of $14.25 to $27.92 — or a shorter interval if the area is 



[Figure 20: upper and lower confidence limits against time in months.]

Fig. 20 — Upper and lower confidence interval. 

larger or the data collection period longer. If the uncertainty in the "true" 
CPPAP represented by this interval is acceptable, then allocation areas 
could be sized to a minimum of 500 pairs with data collection for a 
minimum of 12 months. The uncertainty resulting from alternative 
values of size and time can be checked in this way using Table III. When 
forming allocation areas in a district and determining the length of time 
for data collection, the minimum size and time should be chosen so as 
to produce results precise enough for the decision making needs of the 
district. 
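
The trade-off between area size and collection time can be made explicit with a short sketch. Under eq. (4) the half-width on the transformed scale depends only on the product of assigned pairs and months, so one can solve for the months needed at a given size. The function names, the target half-width used in the example, and the multiplier z = 2 are assumptions made for illustration.

```python
import math


def half_width(pairs, months, sigma=12.40, z=2.0):
    """Half-width of the eq. (4) interval on the ln(CPPAP + 1) scale."""
    return z * sigma / math.sqrt(pairs * months)


def months_needed(pairs, target_half_width, sigma=12.40, z=2.0):
    """Smallest whole number of months giving at most the target half-width."""
    pair_months = (z * sigma / target_half_width) ** 2
    return math.ceil(pair_months / pairs)


# 500 pairs for 12 months and 1000 pairs for 6 months are equally precise.
print(half_width(500, 12) == half_width(1000, 6))

# Months required at several sizes to match a target half-width of 0.3202,
# roughly the precision of 500 pairs observed for 12 months.
for size in (250, 500, 1000):
    print(size, months_needed(size, 0.3202))
```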

4.2. Extending results to individual areas

The basic results presented in Table III are given for only three values 
of the measured CPPAP, six different collection periods, and six area sizes. 
The first and most straightforward extension of this analysis to different 
areas and collection periods involves extending the tables using eq. (4) 
or by linear interpolation of the given table values. As can be seen from 






Figs. 19 and 20, any linear interpolation is more valid for the range of the 
table associated with longer collection times and larger collection areas. 
This is simply because the effect of the transformation is more linear for 
this range of values. 

In the event that users of CPPAP data feel that their areas are signifi- 
cantly different from the Prototype District, which is the basis of Table 
III, there are several ways in which this analysis can be modified. First, 
the constant associated with eq. (6) can be re-estimated using the tech- 
niques described in Section 4.1. While the estimation of the weights in 
the regression is somewhat more complicated than ordinary least 
squares, most commercially available statistical computation packages 
allow for this type of estimation. Having computed the constant which 
relates variability to size of area, it is a simple matter to generate tables 
analogous to Table III. 

However, the logarithmic transformation of CPPAP used here for 
analysis might not always satisfy the desired assumptions. In this case 
a more exploratory analysis should be undertaken. Unfortunately, such 
an analysis will require additional statistical computation and display. 
The sequence of steps discussed in Section III can serve as a guide for 
the analysis, and for checking the appropriateness of various transfor- 
mations. Finally, it is possible that no appropriate transformation will 
be found. Then the method of analysis employed in this section will not 
be adequate. 

V. ANALYSIS OF AFTER CONVERSION DATA 
5.1. Description of analysis

A major concern in the conversion of serving areas to SAC is whether 
or not the projected savings are being realized. To help answer this 
question the cost penalty data in the periods after conversion are ex- 
amined. A regression equation is developed which models the after 
conversion costs in terms of before and during conversion variables as 
well as the time since conversion. The most important result shows that 
the cost penalty continues to decline for the period immediately following 
conversion. The implication of these findings on conversion analysis is 
that to adequately assess the effect of conversion, cost data must be 
collected for a period of nine to twelve months after conversion. 

One might assume, a priori, that there will be differences in the con- 
verted areas but that such differences would not be related to the before 
or during conversion periods. These areas were all rehabilitated using 
the same FAP guidelines, so they should start off on the same footing. 
Differences might be related to installer productivity or activity, or 
geographic considerations of the areas. However, data on such variables 
are outside the scope of the Prototype District Data Base and are not 
currently available. It is of interest to know to what extent after con- 




version behavior might be explained, and the analyses of this section are 
directed at using variables available in the data base to this end. 

Since the logarithmic transformation of the before conversion data 
satisfied straightforward assumptions needed for analysis (see Section 
III), one might expect this transformation also to be reasonable for the 
after conversion data unless there are some "structural" changes in the 
after conversion period. Our analyses do not indicate any such change, 
so the quantity analyzed here is y = ln(CPPAP + 1). Ten of the 23 allo- 
cation areas were converted, and each of these areas has from 1 to 20 
months of after conversion data. The total number of values (areas • 
months) is 100. 

We search for a linear description of the 100 y's of the form

$$y_{ij} = a_0 + a_1 x_{1ij} + a_2 x_{2ij} + \cdots + a_\ell x_{\ell ij} + e_{ij} \qquad (7)$$

where $i$ denotes area; $j$ denotes month; $X_1$ is some descriptive or
explanatory variable with value $x_{1ij}$ for the $i$th area and $j$th month;
similarly for $X_2, \ldots, X_\ell$; and $e_{ij}$ is the residual which is unexplained, and
which should not be related to any available variable. In accord with the
analysis in Sections III and IV, we assume that $\mathrm{var}(e_{ij}) = \sigma^2/p_i$. Thus,
all regressions discussed here are weighted regressions with weights
inversely proportional to the square roots of these variances. The problem
is to find a good but parsimonious set of variables $X_1, X_2, \ldots, X_\ell$.
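
A weighted regression of this kind can be sketched in plain Python as below. The sketch assumes the usual weighted least squares reading of the variance assumption: each squared residual is weighted by $p_i$, which is the same as scaling each observation by $\sqrt{p_i}$. The function names and the tiny data set are hypothetical, and the normal equations are solved directly for brevity rather than numerical robustness.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def weighted_fit(rows, y, pairs):
    """Fit y = a0 + a1*x1 + ... by least squares with weight p_i per observation.

    rows:  explanatory values for each observation (without the constant)
    y:     transformed cost values ln(CPPAP + 1)
    pairs: assigned pairs of the area each observation belongs to
    """
    X = [[1.0] + list(r) for r in rows]
    w = [float(p) for p in pairs]  # weight proportional to 1 / var(e_ij)
    k = len(X[0])
    xtwx = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(k)]
            for a in range(k)]
    xtwy = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y)) for a in range(k)]
    return solve(xtwx, xtwy)


# Hypothetical observations lying exactly on y = 2.0 - 0.5 * x1.
demo_rows = [[1.0], [2.0], [3.0], [4.0]]
demo_y = [2.0 - 0.5 * r[0] for r in demo_rows]
coef = weighted_fit(demo_rows, demo_y, pairs=[250, 500, 750, 1000])
print([round(c, 6) for c in coef])
```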

5.2. Fitted regression equation 

Three classes of potential descriptive variables x are considered. First
are variables which give some characteristics of areas, where these
characteristics can be observed before the after conversion period. Such
a variable has a fixed value for each area ($i$) across months ($j$). Examples
include the distance of an area from the central office, the size of an area,
and the average cost penalty for an area before conversion. The second
class of variables concerns seasonal cycles across months. Such a variable
has a value depending on the months ($j$) but is constant for each area
($i$). The third class consists of the single variable giving the number of
months since conversion of that area; thus $x_{ij} = k$, where month $j$ is $k$
months past the conversion date of area $i$.

Consider the first class of variables. The most powerful such variables
would be a set of 10, with each variable having some non-zero value in
one area and the value zero in all other areas. This gives a one-way
analysis of variance model, with the area corresponding to the treatment
or groups. Doing this, one obtains an $R^2 = 0.28$. This means that 28
percent of the variation in the y's can be explained by differences
between the areas.

The fit is improved substantially ($R^2 = 0.37$) by adding to this model
the variable which measures the number of months since conversion.




However, the further addition of variables allowing different values for 
different months — the seasonal or time effect variables — improves the 
fit only negligibly. Thus, use of all the variables available here would 
result in a model describing about 40 percent of the variability in the 
after conversion values. Although this is not a large percentage on an 
absolute basis, it is also not negligible, especially considering that this 
is variability over months and areas after conversion to SAC. 

Now we would like to go further and discover specific characteristics 
of the ten areas and specific variables that would give a simpler but still 
relatively good descriptive model. The following eight variables mea- 
suring characteristics of the areas were considered: the size of an area, 
as measured by the number of assigned pairs; distance to central office 
along feeder cable, measured in kilofeet; area mean before conversion; 
area standard deviation before conversion; area mean during conversion; 
area standard deviation during conversion; number of months of before 
data available; and number of months during conversion. The above 
one-way analysis of variance implies that the maximum descriptive 
power of any subset, or transformations, of these variables is 28 per- 
cent. 

In order to find a small but good set of variables and transformations,
extensive regression analyses were done, including stepwise calculations
and $C_p$ analysis.¹⁰ As is often the case in such problems, no small set of
variables clearly stands out as the unique "best" regression equation.
Correlations between explanatory variables can permit several different 
sets of variables to fit the data approximately equally well. We will now 
discuss one simple model that does fit these data reasonably well. 

Variables included in the model are the following: number of months
since conversion; during conversion mean; during conversion standard
deviation; and number of months before conversion. The fitted regression
equation is summarized in Table IV, which gives the regression
coefficients, the estimated standard errors, and the t-values for testing
each coefficient equal to zero. The $R^2$ is 0.35 with residual standard error
of 0.44, compared to a standard deviation of 0.54 for the dependent
variable. Thus, use of only four variables gives a fit nearly as tight as can
be obtained when using all possible explanatory variables available here.

Table IV — Fitted equation for after conversion data* 

$$y_{ij} = 1.60 - 0.044\,x_{1ij} + 0.44\,x_{2i} - 1.13\,x_{3i} - 0.038\,x_{4i}$$

                  Constant   x_{1ij}   x_{2i}   x_{3i}   x_{4i}
Coefficient         1.60     -0.044     0.44    -1.13    -0.038
Standard error      0.36      0.009     0.14     0.37     0.010
t-statistic         4.41     -4.89      3.19    -3.07    -3.80

* $y_{ij}$ = ln(CPPAP + 1)
$x_{1ij}$ = number of months since conversion
$x_{2i}$ = during conversion mean
$x_{3i}$ = during conversion standard deviation
$x_{4i}$ = number of months before conversion
($x_2$, $x_3$, and $x_4$ are all the same over all months; hence, the time subscript $j$ is omitted.)




No monthly time variable or cyclic time effect is included, since the 
analysis showed that they had no additional explanatory power. 

Examination of various residual plots is important in determining the
adequacy of this fit. Figure 21 gives a partial residual plot¹¹ for the
number of months since cut-over ($x_1$) variable. The variable plotted on
the vertical axis is the residual from the regression fit plus the
contribution from this variable. Thus, one expects the points to scatter about
a straight line with slope equal to the regression coefficient for $x_1$, here
-0.044. This figure does not suggest any serious inadequacy in the fit
as far as this variable is concerned. Partial residual plots and residual
plots for the other variables, normal q-q plots, and various box plots of
the data and residuals were also examined. They did not show anything
particularly noteworthy.
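
The construction of a partial residual can be sketched in a few lines. The residuals and months below are invented for illustration (chosen to have zero mean and to be uncorrelated with the months, as least squares residuals are); only the coefficient -0.044 is taken from the fitted equation. The function names are hypothetical.

```python
def partial_residuals(residuals, x, coef):
    """Residual from the full fit plus the contribution coef * x of one variable."""
    return [r + coef * xi for r, xi in zip(residuals, x)]


def ls_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


months_since = [1.0, 2.0, 3.0]      # hypothetical x1 values
fit_residuals = [0.1, -0.2, 0.1]    # hypothetical residuals from the full fit
pr = partial_residuals(fit_residuals, months_since, coef=-0.044)
print(round(ls_slope(months_since, pr), 3))  # slope equals the coefficient
```

Plotting `pr` against `months_since` gives the kind of display described for Fig. 21; curvature or outliers in such a plot would point to an inadequate linear term.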

Consider the interpretation of the variables in the fitted equation. For
variable $x_1$, the number of months since cut-over, it is not surprising that
the level declines over time after the conversion is completed, since 
unknown cable troubles and defective pairs will be discovered and cor- 
rected. Figure 21, introduced above, shows graphically that there is a 
steadily decreasing trend as the number of months since cut-over in- 
creases. There is not an instantaneous decline to a low, constant level. 




[Figure 21: partial residual plot against number of months after conversion.]



Fig. 21 — Values of ln(CPPAP + 1) after conversion. 




Moreover, this variable ($x_1$) appears with approximately the same
negative coefficient in all "reasonably fitting" sets of variables, while other
individual variables are not so strongly needed in order to obtain an
adequate fit. For variable $x_2$, the during conversion mean, it seems
reasonable that a higher during conversion level (a proxy for the
complexity of the conversion activity) will be associated with a larger after
conversion level. However, the interpretations for the during conversion
standard deviation ($x_3$) and the number of months before conversion
($x_4$) are not as straightforward. For example, one could speculate that
areas with a high level of during variability have spots of local congestion
causing occasional high costs (i.e., RE's, LST's, WOL's, etc.). A large
standard deviation implies that there are also months in which costs are
low. It is just this type of area that can show large savings (and lower
values of CPPAP) after conversion via FAP. The number of months before
conversion could be a proxy for the ranking of the converted areas.
Presumably, the worst areas would be converted earlier. Hence, the
better areas are converted later and the post conversion costs of the
better areas are lower (other things being equal).
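
As a rough reading of the months-since-conversion coefficient, the sketch below converts it to the original scale. Because the fitted equation is linear in ln(CPPAP + 1), a change of k months multiplies the fitted CPPAP + 1 by exp(-0.044 k) when the other variables are held fixed. This is arithmetic on the fitted coefficient only, not an additional result from the data, and the function name is hypothetical.

```python
import math

B1 = -0.044  # coefficient of months since conversion, from Table IV


def remaining_fraction(months, b1=B1):
    """Multiplier on fitted (CPPAP + 1) after the given number of months."""
    return math.exp(b1 * months)


for k in (1, 9, 12):
    print(k, round(remaining_fraction(k), 3))
```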

VI. ACKNOWLEDGMENT 

We are indebted to many members of Department 4511 for their time 
and effort in explaining concepts and issues pertaining to the loop plant. 
Special thanks are due Nancy Basford who never failed to respond 
helpfully to our many queries. 

REFERENCES 

1. G. W. Aughenbaugh and H. T. Stump, "The Facility Analysis Plan: New Methodology for Improving Loop Plant Operations," B.S.T.J., this issue.
2. J. O. Bergholm, private communication.
3. J. O. Bergholm and P. P. Koliss, "Serving Area Concept — A Plan for Now With a Look to the Future," Bell Laboratories Record, 50, No. 7 (August 1972), pp. 212-216.
4. N. L. Long, "Loop Plant Modeling: Overview," B.S.T.J., this issue.
5. G. W. Aughenbaugh, N. L. Basford, D. M. Dunn, A. E. Gibson, and J. M. Landwehr, private communication.
6. A. E. Gibson, personal communication.
7. H. P. Friedman, et al., "A Graphical Way of Describing Changing Multivariate Patterns," Proc. of the Comp. Science and Statistics 6th Annual Symposium on the Interface (1972), pp. 56-59.
8. J. W. Tukey, Exploratory Data Analysis, Reading, Mass.: Addison-Wesley Publishing Co., 1977.
9. M. B. Wilk and R. Gnanadesikan, "Probability Plotting Methods for the Analysis of Data," Biometrika, 55 (1968), pp. 1-17.
10. C. L. Mallows, "Some Comments on C_p," Technometrics, 15 (November 1973), pp. 661-676.
11. W. A. Larsen and S. J. McCleary, "The Use of Partial Residual Plots in Regression Analysis," Technometrics, 14 (August 1972), pp. 781-790.


